Accuracy vs. Comprehensibility in Data Mining Models

نویسندگان

  • Ulf Johansson
  • Lars Niklasson
  • Rikard König
چکیده

This paper addresses the important issue of the tradeoff between accuracy and comprehensibility in data mining. The paper presents results which show that it is, to some extent, possible to bridge this gap. A method for rule extraction from opaque models (Genetic Rule EXtraction – G-REX) is used to show the effects on accuracy when forcing the creation of comprehensible representations. In addition the technique of combining different classifiers to an ensemble is demonstrated on some well-known data sets. The results show that ensembles generally have very high accuracy, thus making them a good first choice when performing predictive data mining.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Building comprehensible customer churn prediction models with advanced rule induction techniques

Customer churn prediction models aim to detect customers with a high propensity to attrite. Predictive accuracy, comprehensibility, and justifiability are three key aspects of a churn prediction model. An accurate model permits to correctly target future churners in a retention marketing campaign, while a comprehensible and intuitive rule-set allows to identify the main drivers for customers to...

متن کامل

The Truth is In There - Rule Extraction from Opaque Models Using Genetic Programming

A common problem when using complicated models for prediction and classification is that the complexity of the model entails that it is hard, or impossible, to interpret. For some scenarios this might not be a limitation, since the priority is the accuracy of the model. In other situations the limitations might be severe, since additional aspects are important to consider; e.g. comprehensibilit...

متن کامل

Why Not Use an Oracle When You Got One?

The primary goal of predictive modeling is to achieve high accuracy when the model is applied to novel data. For certain problems this requires the use of complex techniques like neural networks or ensembles, resulting in opaque models that are hard or impossible to interpret. For some domains this is unacceptable, since models need to be comprehensible. To achieve comprehensibility, accuracy i...

متن کامل

Tuve Löfström & Ulf Johansson

When performing data mining, the selection of data mining technique is a critical decision. Often this choice boils down to whether a transparent model is needed or not. Most research indicates that techniques producing transparent models, such as decision trees, often have an inferior accuracy compared to techniques such as neural networks. On the other hand, models created by neural networks ...

متن کامل

A Novel Genetic Algorithm Based Method for Building Accurate and Comprehensible Churn Prediction Models

Customer churn has become a critical problem for all companies in particular for those that are operating in service-based industries such as telecommunication industry. Data mining techniques have been used for constructing churn prediction models. Past research in churn prediction context have mainly focused on the accuracy aspect of the constructed churn models. However, in addition to the a...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2004